A common type system for clinical natural language processing

نویسندگان

  • Stephen T. Wu
  • Vinod Kaggal
  • Dmitriy Dligach
  • James J. Masanz
  • Pei J. Chen
  • Lee Becker
  • Wendy W. Chapman
  • Guergana K. Savova
  • Hongfang Liu
  • Christopher G. Chute
چکیده

UNLABELLED BACKGROUND One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. RESULTS We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. CONCLUSIONS We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارایه یک پیکره‌ پرسش و پاسخ مذهبی در زبان فارسی

Question answering system is a field in natural language processing and information retrieval noticed by researchers in these decades. Due to a growing interest in this field of research, the need to have appropriate data sources is perceived. Most researches about developing question answering corpus area have been done in English so far, but in other languages as Persian, the lack of these co...

متن کامل

Assessing clinical trial eligibility with logic expression queries

This paper introduces a system that processes clinical trials using a combination of natural language processing and database techniques. We process web-based clinical trial recruitment pages to extract semantic information reflecting eligibility criteria for potential participants. From this information we then formulate a query that can match criteria against medical data in patient records. ...

متن کامل

سیستم شناسایی و طبقه‌بندی موجودیت‌های اسمی در متون زبان فارسی بر پایه شبکه عصبی

Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...

متن کامل

Corpus based coreference resolution for Farsi text

"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

متن کامل

Integrating a Natural Language Message Pre-Processor with UIMA

This paper describes the use of the Unstructured Information Management Architecture (UIMA) to integrate a set of natural language processing (NLP) tools in the RADAR system. The challenge was to define a common data model and a set of component interfaces for these tools, so that they could be integrated into a single system. The integrated system is used to pre-process each email arriving in ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 4  شماره 

صفحات  -

تاریخ انتشار 2013